Me, myself and I


Developer 


Developer advocate 
Data Residency


“Data localization or data residency law 
requires data about a nation’s citizens or 
residents to be collected, processed, and/or 
stored inside the country, often before 
being transferred internationally. Such data 
is usually transferred only after meeting 
local privacy or data protection laws, such 
as giving the user notice of how the 
information will be used and obtaining their 
consent.” 


-- https://en.wikipedia.org/wiki/Data_localization 


Data Residency Flavors 


- Legal Requirements
- Jurisdictional challenges
Legal Requirements in 2023


Country      Scope
Australia    Health records
Canada       Public service providers: all personal data
China        Personal, business, and financial data
Germany      Telecommunications metadata
India        Payment system data
Indonesia    Public services companies must maintain data centers in country
Kazakhstan   Servers running on the country domain (.kz)
Nigeria      All government data
Russia       All personal data
Jurisdictional challenges: Patriot Act & US CLOUD Act


“[...] primarily amends the Stored
Communications Act (SCA) of 1986 to allow 
federal law enforcement to compel U.S.- 
based technology companies via warrant 
or subpoena to provide requested data 
stored on servers regardless of whether the 
data are stored in the U.S. or on foreign 
soil” 
-- https://en.wikipedia.org/wiki/CLOUD_Act
Cloud Providers and Data Sovereignty


- Google:
  - No data center in Malaysia
- AWS:
  - No data center in Malaysia
- Azure:
  - No data center in Malaysia
  - “Coming soon” (June 2023)
Data Partitioning


“A partition is a division of a logical 
database or its constituent elements into 
distinct independent parts. Database
partitioning is normally done for 
manageability, performance or availability 
reasons, or for load balancing. It is popular 
in distributed database management 
systems, where each partition may be 
spread over multiple nodes” 


-- https://en.wikipedia.org/wiki/Partition_(database) 
Partitioning criteria


- Round-robin partitioning
- (Consistent) Hash partitioning
- Range partitioning
- List partitioning
- Composite partitioning
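
To make the criteria above concrete, here is a minimal Python sketch of each one as a function that maps a row to a partition number. The partition count, range bounds, and country lists are illustrative assumptions, not values from the talk.

```python
import hashlib
from bisect import bisect_right
from itertools import count

PARTITIONS = 4  # hypothetical number of partitions

_counter = count()


def round_robin() -> int:
    """Round-robin: assign rows to partitions in turn, ignoring their content."""
    return next(_counter) % PARTITIONS


def hash_partition(key: str) -> int:
    """Hash: derive the partition from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITIONS


# Exclusive upper bounds of partitions 0..2; larger values go to partition 3.
RANGE_BOUNDS = [1000, 2000, 3000]  # illustrative values


def range_partition(value: int) -> int:
    """Range: pick the partition whose value range contains the input."""
    return bisect_right(RANGE_BOUNDS, value)


# Explicit list of key values per partition (illustrative values).
COUNTRY_LISTS = {"us": 0, "ca": 0, "fr": 1, "de": 1}


def list_partition(country: str) -> int:
    """List: look the key up in an explicit list of values per partition."""
    return COUNTRY_LISTS[country]


def composite_partition(country: str, user_id: str) -> tuple:
    """Composite: combine two criteria, here list first, then hash."""
    return (list_partition(country), hash_partition(user_id))
```

Only the list and composite variants let you state where a given row must live, which is why they matter for data residency.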
Data Partitioning in the wild


- Most data partitioning is for horizontal scaling
  - Consistent hashing
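
For readers unfamiliar with consistent hashing, the following is a small Python sketch of a hash ring with virtual nodes. The class name, the number of virtual nodes, and the node names are assumptions made for illustration; production systems use more refined variants.

```python
import hashlib
from bisect import bisect_right


def _hash(value: str) -> int:
    """Map a string to a stable 64-bit point on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self._replicas):
            self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove(self, node):
        """Drop every virtual point that belongs to the node."""
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        """Return the node owning the first point clockwise from the key."""
        points = [p for p, _ in self._ring]
        index = bisect_right(points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

Adding a node only moves the keys that the new node takes over; every other key stays where it was. That property helps horizontal scaling, but it gives no control over which location a given key ends up in.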
Partitioning criteria


- List partitioning
- Composite partitioning?
A simple two-location architecture


Location X 
Our very complex algorithm


IF country = 'us' THEN
    Location = 'X'
ELSE IF country = 'fr' THEN
    Location = 'Y'
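
The pseudocode above is plain list partitioning on the country. A minimal Python sketch could look like the following; the function and table names are hypothetical, and raising an error for an unknown country is an assumption, since the slide does not say what happens in that case.

```python
# Country-to-location table taken from the pseudocode above.
LOCATION_BY_COUNTRY = {"us": "X", "fr": "Y"}


def compute_location(country: str) -> str:
    """Return the location that must hold data for the given country."""
    try:
        return LOCATION_BY_COUNTRY[country.strip().lower()]
    except KeyError:
        # Assumption: fail loudly rather than store data in the wrong place.
        raise ValueError(f"no location configured for country {country!r}") from None
```

Keeping the mapping in a table rather than in branches means adding a country is a data change, not a code change.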


Where to put the algorithm


- In the client?
- In the app?
  - In a lib/framework?
- In a proxy?
- In the API Gateway?
Client-side algorithm


- “Smart” client
- Needs to know the topology
- Doesn't work on the first request


Server-side algorithm 


- Most flexible approach
  - Allows fetching additional data to compute location
- Requires code
  - Most error prone
- Performance cost of crossing location


@nicolas_frankel


Proxy-based algorithm 


- More complex architecture
  - One more moving piece
- More decoupled
- Same performance cost of crossing location
API Gateway-based algorithm


- Early decision
- Gateway already knows the upstreams
- No access to the data
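
Because the gateway has no access to the data, it can only decide from what the request itself carries. The sketch below illustrates that constraint in Python. It is not APISIX configuration: the header name, upstream URLs, and default location are all hypothetical.

```python
# Hypothetical upstreams, one app per location.
UPSTREAMS = {
    "X": "http://app-x.internal:8080",
    "Y": "http://app-y.internal:8080",
}
LOCATION_BY_COUNTRY = {"us": "X", "fr": "Y"}
COUNTRY_HEADER = "X-Country"  # assumed header set by the client
DEFAULT_LOCATION = "X"        # assumed fallback when the header is missing


def pick_upstream(headers: dict) -> str:
    """Choose an upstream from request headers only, without any data lookup."""
    normalized = {name.lower(): value for name, value in headers.items()}
    country = normalized.get(COUNTRY_HEADER.lower(), "").strip().lower()
    location = LOCATION_BY_COUNTRY.get(country, DEFAULT_LOCATION)
    return UPSTREAMS[location]
```

A request without the header still has to go somewhere, so the gateway's early decision can be wrong.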
A tentative proposal: double computation


1. The first request hits any of the two endpoints
2. The response sends additional metadata to store client-side
3. The second request queries the correct Gateway
4. The Gateway computes the location based on the metadata and forwards the request to the correct app
5. The app computes the location again
   1. If it’s correct, continue to the database
   2. If it’s not, forward to the correct database

Implemented client- AND server-side
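
The steps above can be sketched in Python as two functions, one for the gateway and one for the app. Everything here is an assumption for illustration: the `Request` shape, the `USERS` lookup standing in for the app's own data, and the default location used when no metadata is present.

```python
from dataclasses import dataclass
from typing import Optional

LOCATION_BY_COUNTRY = {"us": "X", "fr": "Y"}
USERS = {"alice": "us", "bob": "fr"}  # stand-in for data only the app can read


@dataclass
class Request:
    user_id: str
    country_hint: Optional[str] = None  # metadata stored client-side


def gateway_route(request: Request, default: str = "X") -> str:
    """Step 4: the gateway decides from the metadata alone."""
    if request.country_hint is None:
        return default  # first request: any endpoint will do
    return LOCATION_BY_COUNTRY.get(request.country_hint, default)


def app_handle(app_location: str, request: Request) -> dict:
    """Step 5: the app recomputes the location from its own data."""
    country = USERS[request.user_id]
    correct = LOCATION_BY_COUNTRY[country]
    return {
        "database": correct,                    # always the correct database
        "forwarded": correct != app_location,   # True if we crossed locations
        "metadata": {"country": country},       # step 2: sent back to the client
    }
```

On the first request the client has no metadata, so the app may have to cross locations once. Later requests carry the hint and reach the right app directly.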
One possible implementation with the Apache Stack


APISIX  ShardingSphere
Apache APISIX, an API Gateway the Apache way


Apache APISIX: Software Architecture


[Figure: APISIX layers, top to bottom: plugins (Observability, Traffic Management), APISIX Plugin Runtime, APISIX Core, OpenResty, Nginx]
Apache ShardingSphere


“Apache ShardingSphere is an 
ecosystem to transform any 
database into a distributed 
database system, and enhance 
it with sharding, elastic scaling, 
encryption features & more.” 


-- https://shardingsphere.apache.org/document/current/en/overview/ 




Architecture, limitations and choices 


- No client storage for metadata
- Single API Gateway
- Two apps
- Two databases
- JVM-based app
  - Kotlin
  - Spring Boot
  - ShardingSphere via library




Demo set up 
Thanks for your attention!
@nico@frankel.ch


https://bit.ly/dataresid




https://apisix.apache.org/ 


